Developing Bioinformatics Software: Continuous Integration

R
Python
Bioinformatics
Software Development
Author

Kelly Sovacool

Published

2023-03-15

A typical git workflow

The current best practice for using git to manage collaborative software projects is known as trunk-based development. Under this model, small changes are frequently made in different branches, then merged into the main “trunk” (i.e. the main or master branch) of the repo after passing peer review. The steps look like this:

  1. An issue is opened
    • a developer or user notices a bug, requests a feature, or asks a question.
  2. Engage in the issue comments
    • to clarify the issue, ask for a reproducible example, etc.
  3. Work on the issue
    1. create a new branch and switch to it.
    2. write tests that will pass when the issue is resolved.
    3. write or edit code to resolve the issue.
    4. (possibly) write more tests to make sure edge cases and failure modes are handled.
    5. write/update documentation if needed.
    6. make sure your tests pass and the package still builds.
  4. Create a pull request
    1. assign or request a reviewer.
    2. the reviewer reviews your code.
    3. you make any requested changes.
    4. the reviewer approves your pull request once they’re happy with it.
    5. merge the pull request.
  5. Celebrate that you resolved an issue!

You can have multiple issues open at any stage of the process at a time. You might start working on a feature, switch to fixing a time-sensitive bug and resolve it, then later go back to working on that feature. Meanwhile, collaborators are working on other issues too! This process enables highly collaborative and asynchronous work. Making changes in separate branches and merging them into main only after testing and peer review helps ensure that only high quality code is adopted.

Continuous integration: what & why?

git + ci = magic ✨

It would be a bummer if you or a collaborator forgot a crucial step of the process, like running the unit tests or linting your code, and accidentally merged buggy/broken/bad code into the main branch of your project. The good news is: You don’t have to remember everything! Let the machines do it for you automatically!

Continuous integration is a practice where tests and other code quality checks are automatically run before code changes are merged into the main branch.

How does this modify our git workflow? When we open a pull request or push a commit to main, the CI service will run a workflow we define to run our checks, so we don’t have to do it manually!

CI service options
  • GitHub Actions
  • Travis
  • Jenkins
  • CircleCI
  • Azure DevOps

We’ll use GitHub Actions because it’s easy to set up, you’re already using GitHub for your projects, and it provides a lot of computing resources for free.

Building a CI workflow with GitHub Actions

We’re going to create a CI workflow that runs on all pushes and pull requests to the default branch (typically “main” or “master”). Workflows are defined with YAML files to specify how to configure the machine that runs the workflow, install dependencies, and run commands.

Let’s start by creating a small workflow that prints “Hello, world!” and lists the files in the package.

I will demonstrate with two example packages: bionitio-r and bionitio-python.

Configure permissions

Before we can get started using GitHub Actions, we’ll need to make sure we configure our repo settings to allow Actions to run and push changes.

Enable Actions

On GitHub.com, go to your repository, click Settings, and under ‘Code and automation’ click Actions -> General. Under ‘Actions permissions’, select Allow all actions and reusable workflows and click Save.

Allow Actions to read & write

Next, scroll down to the bottom of the page. Under ‘Workflow permissions’, select Read and write permissions and click Save.

Now we’re ready to start using GitHub Actions for our projects!

Getting started

Every GitHub Actions workflow resides in .github/workflows/ and needs:

  • on – events that trigger the workflow
  • jobs – list of independent jobs each with steps to run in sequence.
  • steps – each step either specifies a third-party action to run with uses, or specifies shell code to run with run.
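To make the roles of on, jobs, and steps concrete, here is a minimal sketch that writes the same structure as a Python dict and checks whether an event would trigger it. The function name is_triggered is hypothetical, and the check is deliberately simplified: it only matches exact branch names, whereas GitHub Actions also supports glob patterns and other filters.

```python
# A workflow's top-level keys, written as a Python dict for illustration only.
workflow = {
    "name": "greet",
    "on": {
        "push": {"branches": ["main", "master"]},
        "pull_request": {"branches": ["main", "master"]},
    },
    "jobs": {
        "greet": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v3"},
                {"name": "Greet", "run": 'print("Hello, world!")'},
            ],
        }
    },
}


def is_triggered(workflow, event, branch):
    """Simplified trigger check (hypothetical helper, exact branch names only).

    For a push, `branch` is the branch pushed to; for a pull request,
    it is the base branch that the pull request targets.
    """
    trigger = workflow["on"].get(event)
    if trigger is None:
        return False
    branches = trigger.get("branches")
    return branches is None or branch in branches


print(is_triggered(workflow, "push", "ci"))            # pushing to a feature branch
print(is_triggered(workflow, "pull_request", "main"))  # pull request into main
```

Under these assumptions, a push to a branch called ci does not start the workflow, but a pull request from that branch into main does.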

Workflow 1: .github/workflows/greet.yml

# name of the workflow
name: greet

# when the workflow should run
on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

# independent jobs in the workflow
jobs:
  # this workflow just has one job called "greet"
  greet:
    # the operating system to use for this workflow
    runs-on: ubuntu-latest
    # list of steps in the workflow
    steps:
      # use an action provided by github to checkout the repo
      - uses: actions/checkout@v3
      # a custom step that runs a couple shell commands
      - name: List
        run: |
            echo "listing files in the bionitio directory"
            ls bionitio
      # a custom step that runs R code
      - name: Greet
        run: print("Hello, world!")
        # Replace `shell: Rscript {0}` with `shell: python {0}` to run Python code instead!
        shell: Rscript {0}
Create the “Hello world” action
  1. Open an issue with the title “Set up continuous integration”.
  2. Switch to a new branch called ci.
  3. Add the workflow (Workflow 1) to your repo at .github/workflows/greet.yml.
  4. Replace bionitio with the name of your package.
  5. Commit and push it to GitHub.
  6. Finally, open a pull request from your new branch into main.

Did your workflow run? On GitHub, go to the Actions tab of your repo. Opening the pull request should have triggered the workflow to run.

Once the workflow finishes (about 15 seconds), it will either have a green checkmark (✅) for success or a red X (❌) for failure.

Click on the workflow run. Then under ‘jobs’, click on the job ‘greet’. You’re now viewing the log file for the job. You can click on the arrows to expand the details for each step.

You can also see the status of the action from the Pull Request summary page. Keep your pull request open. We’re going to continue pushing commits to the ci branch as we add new steps to the workflow.

greet status

In Slack, react with ✅ or ❌ to indicate the status of your workflow.

Test suite

This initial “hello world” workflow is cute, but not very useful. Let’s edit the workflow to run our test suite for us automatically!

R

Use devtools::test() to run just the tests, or devtools::check() to run all checks for CRAN.

test R .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check
          working-directory: bionitio
      - name: Check
        uses: r-lib/actions/check-r-package@v2
        with:
          args: 'c("--no-manual", "--as-cran")'
          working-directory: bionitio
Repo tree

The r-lib actions assume that the top level of your repo is the same as the top level of your R package. If that’s not the case, you’ll need to specify the working-directory.

For my example project, bionitio-r is the top level of the git repo, and from there the R package resides in bionitio:

bionitio-r
├── README.md
├── .github
│   └── workflows
│       └── ci.yml
├── bionitio
│   ├── DESCRIPTION
│   ├── R
│   │   ├── bionitio.R
│   │   └── file_utils.R
│   └── tests
│       ├── testthat
│       │   └── test-stats.R
│       └── testthat.R
Ignore check dir

The check-r-package action creates files in /package/check/. We don’t want git to track them, so we need to add the check dir to the .gitignore file:

R .gitignore
.Rproj.user
.Rhistory
.RData
.Ruserdata
check/

Python

Use pytest to run the test suite.

test Py .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.11
      uses: actions/setup-python@v3
      with:
        python-version: "3.11"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest
        if [ -f requirements.txt ]; then
            pip install -r requirements.txt
        fi
    - name: Test with pytest
      run: |
        pytest .

In each of these workflows, the action checks out the repo, installs R or Python, installs the dependencies of the package, then runs the tests. If any of your tests fail, the whole actions workflow will fail too.
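The fail-fast behavior described above can be imitated locally. The sketch below is not how GitHub Actions is implemented; it is a small illustration in which run_steps and the step names are hypothetical, and each command is a stand-in for a real check such as pytest.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns (succeeded, names_of_steps_that_ran).
    """
    ran = []
    for name, command in steps:
        ran.append(name)
        result = subprocess.run(command)
        if result.returncode != 0:
            # A non-zero exit code fails the step, and the rest are skipped.
            return False, ran
    return True, ran


# Hypothetical steps; the "test" command simulates a failing test suite.
steps = [
    ("install", [sys.executable, "-c", "print('installing dependencies')"]),
    ("test", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("lint", [sys.executable, "-c", "print('linting')"]),
]

succeeded, ran = run_steps(steps)
print(succeeded, ran)
```

Because the simulated test step exits with a non-zero status, the lint step never runs and the run as a whole is reported as failed.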

Dependencies

Have you been keeping track of your package’s dependencies? Be sure to add them according to the instructions below! If you need a certain minimum version of a package, you can specify the version number with PACKAGE >= VERSION in requirements.txt, e.g. biopython >= 1.70, or with PACKAGE (>= VERSION) in an R DESCRIPTION file.

R imports

If your package depends on any other packages, you need to add them to DESCRIPTION under Imports for required dependencies or Suggests if only needed for some functions. bionitio-r needs two R packages, so they’re listed like so:

bionitio/DESCRIPTION
Package: bionitio
Type: Package
Title: Calculate FASTA statistics
Version: 0.1.0
Description: This package reads in one or more input FASTA files and calculates
    a variety of statistics.
License: MIT + file LICENSE
Imports:
    seqinr,
    logging
Suggests:
    devtools,
    testthat

Python requirements

If your package depends on any other packages, you need to add them to requirements.txt. bionitio-python needs one Python package listed like so:

bionitio/requirements.txt
biopython >= 1.70
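As a rough illustration of what a minimum-version constraint means, the sketch below parses a requirement line and compares versions part by part. The helper names are hypothetical and the parsing only handles simple numeric versions with a >= constraint; pip itself follows the much more complete rules of PEP 440.

```python
def parse_requirement(line):
    """Split a line like 'biopython >= 1.70' into (name, minimum_version)."""
    name, _, version = line.partition(">=")
    return name.strip(), version.strip()


def version_tuple(version):
    """Turn '1.70' into (1, 70) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


name, minimum = parse_requirement("biopython >= 1.70")
print(name, minimum)
print(satisfies_minimum("1.81", minimum))
print(satisfies_minimum("1.8", minimum))
```

Comparing the parts as integers matters: version 1.8 is older than 1.70, even though 1.8 is the larger number when read as a decimal.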
Testing with CI

In your ci branch, modify your CI workflow to run the test suite, then commit and push your changes. Does the CI workflow succeed or fail?

You may get failures if you haven’t been running your unit tests or tracking dependencies as you develop your code base. Go to the workflow log file and expand the test step to see why it failed. Take a few minutes to open issues for each test that failed. If the problem is with your dependencies, fix them now.

React to the Slack message with ✅ when you’re finished opening issues and fixing dependencies, or right away if the workflow completed successfully.

Workflow status badges

Each Actions workflow has a status badge that indicates whether the action is passing or failing. You may have come across status badges in GitHub README files of packages you use. Putting a CI status badge in the README file is a popular way for project maintainers to prominently display that CI is set up and it’s working!

Add the workflow status badge to your README

Under the Actions tab, click the name of the workflow (e.g. ci), click the triple dots menu (...) in the upper right corner, and select Create status badge.

In the pop-up menu, click Copy status badge Markdown, paste it into your README.md file, then commit and push your change on the ci branch.

React to the Slack message with ✅ when you’re finished.

Now anyone who takes a look at your README file will see that your project uses continuous integration!

Lint and style code

Many large software projects follow a specific coding style guide to make sure their code base is consistent and easy to read.

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.

The Tidyverse Style Guide

A linter checks your code against a style guide and raises warnings wherever the code doesn’t conform. A code formatter or styler modifies your code to make it conform to a style guide. There is a lot of overlap in the problems that linters and formatters can catch. However, linters warn not only about style problems but also about more serious problems such as syntax errors.

Language    Linter    Formatter
R           lintr     styler
Python      flake8    black

When adding these tools to CI, make sure you run the formatter before the linter, so the linter will only complain about problems that the formatter can’t fix. Since the code formatter modifies our code, we will also need to commit and push the code changes using the GitHub Actions bot as the author.
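The difference between the two kinds of tools, and the reason for running the formatter first, can be shown with toy versions of each. These are hypothetical stand-ins written for illustration, not how black, flake8, styler, or lintr actually work.

```python
MAX_LINE_LENGTH = 88


def format_source(source):
    """Toy formatter: rewrites the code by stripping trailing whitespace."""
    return "\n".join(line.rstrip() for line in source.split("\n"))


def lint_source(source):
    """Toy linter: reports (line_number, message) pairs without changing the code."""
    warnings = []
    for number, line in enumerate(source.split("\n"), start=1):
        if line != line.rstrip():
            warnings.append((number, "trailing whitespace"))
        if len(line) > MAX_LINE_LENGTH:
            warnings.append((number, "line too long"))
    return warnings


# Line 1 has trailing spaces; line 2 is far longer than 88 characters.
source = "x = 1   \ny = " + "1 + " * 30 + "1\n"

print(lint_source(source))                 # both problems are reported
print(lint_source(format_source(source)))  # only the long line remains
```

After formatting, the linter only reports the overly long line, which is the one problem this toy formatter cannot fix on its own.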

style R .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

env:    # configure environment variables for git commits
  actor: "41898282+github-actions[bot]"

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:  # also install styler & lintr
          extra-packages: any::rcmdcheck, any::styler, any::lintr
          needs: check
          working-directory: bionitio
      - name: Configure git # use the environment variable we set above
        run: |
          git config --local user.email "${actor}@users.noreply.github.com"
          git config --local user.name "$actor"
      - name: Check
        uses: r-lib/actions/check-r-package@v2
        with:
          args: 'c("--no-manual", "--as-cran")'
          working-directory: bionitio
      - name: Style & lint
        run: |
            styler::style_dir(".")
            lintr::lint_dir(".")
        shell: Rscript {0}
      - name: Commit and push changes
        run: |
          git add .
          git commit -m "🎨 Style code" || echo "No changes to commit"
          git push
style Py .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

env:    # configure environment variables for git commits
  actor: "41898282+github-actions[bot]"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.11
      uses: actions/setup-python@v3
      with:
        python-version: "3.11"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest black flake8 # also install black & flake8
        if [ -f requirements.txt ]; then
            pip install -r requirements.txt
        fi
    - name: Configure git  # use the environment variable we set above
      run: |
        git config --local user.email "${actor}@users.noreply.github.com"
        git config --local user.name "$actor"
    - name: Test
      run: |
        pytest .
    - name: Format & lint
      run: |
        black . # first run black, then run flake8
        flake8 --extend-ignore E203 --max-line-length 88 .
    - name: Commit and push changes
      run: |
        git add .
        git commit -m "🎨 Style code" || echo "No changes to commit"
        git push
Note

flake8 is not 100% compatible with black by default. Here we direct flake8 to ignore one of its errors (--extend-ignore E203) and increase the maximum allowed line length (--max-line-length 88) to make flake8 compatible with black.
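As a small illustration of the E203 conflict, the snippet below contains a slice written the way black formats it. The variable names are made up for the example.

```python
ham = list(range(10))
lower, upper, offset = 1, 4, 2

# black treats ":" in a complex slice like a binary operator and puts a space
# on both sides of it. flake8 reports the space before ":" as E203 unless
# that check is ignored.
sliced = ham[lower + offset : upper + offset]
print(sliced)
```

The line-length setting matters for a similar reason: black wraps lines at 88 characters by default, while flake8’s default limit is 79, so a line that black considers fine can still be flagged by flake8.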

There are several key changes we made to this workflow to make sure our styling and linting would work:

  1. Set a global environment variable called actor with the username of the GitHub Actions bot.
  2. Configure the git username and email to point to the GitHub Actions bot, using the environment variable we created as above.
  3. Install additional dependencies for linting and formatting the code.
  4. Run the code formatter and linter.
  5. Commit any changes and push to origin.
Style & lint your code

Modify your workflow to style and lint your code, and see what happens when you push it to GitHub.

Does the linter raise any errors? If so, take a moment to open issues for the errors you need to fix. React to the Slack message with ✅ when you’re finished opening issues, or right away if your code is already lint-free.

Are your tests failing?

If you have some failing tests, the workflow will fail before it gets to the lint & style step. You can temporarily comment out any failing steps with hashes (#) so you can continue through this tutorial, but don’t forget to uncomment these lines later!

Code coverage

Code coverage is the percentage of your source code that is covered by unit tests. Generally the higher the code coverage, the better. It can be a useful metric to see where there are holes in your tests.
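To show what “covered” means in practice, here is a minimal sketch that records which lines of a function actually run for a given input, using Python’s built-in tracing hook. The functions classify_gc and lines_executed are hypothetical examples; real coverage tools such as pytest-cov are far more thorough.

```python
import sys


def classify_gc(seq):
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    if gc_fraction > 0.5:
        return "GC-rich"
    return "AT-rich"


def lines_executed(func, *args):
    """Return the set of line numbers inside `func` that run for these arguments."""
    hit = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Only follow frames that belong to the function under test.
        if frame.f_code is code:
            if event == "line":
                hit.add(frame.f_lineno)
            return tracer
        return None

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


gc_only = lines_executed(classify_gc, "GGCC")
both = gc_only | lines_executed(classify_gc, "AATT")
percent = 100 * len(gc_only) / len(both)
print(f"the GC-rich test alone reaches {percent:.0f}% of the lines both tests reach")
```

A test suite that only ever passes GC-rich sequences never executes the final return statement, and that is exactly the kind of hole a coverage report points out.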

Codecov.io is a free tool for open source projects that pairs nicely with GitHub Actions for generating code coverage reports! Let’s set it up now.

Log in to Codecov

Go to https://about.codecov.io/ and Login with GitHub. If this is the first time you’re connecting Codecov and GitHub, you may need to grant Codecov permission to read your repositories.

Once you’re logged in, you should see a list of all your GitHub repos (and maybe also those of any organizations you’re a member of). Scroll down to your repo for this class and click setup repo.

Follow the instructions on the next page to set up code coverage for your repo. Just do Step 1 and Step 2 now; we need to make some custom modifications to Step 3 for our projects.

Warning

Your CODECOV_TOKEN should be kept secret. Don’t paste it anywhere except for in your repository’s Actions secrets.

Codecov setup

React to the message on Slack with 1️⃣ for Step 1 and 2️⃣ for Step 2 once you complete them. Don’t do the other Steps yet.

Add codecov to the workflow

We’ll need to make sure the test suite generates a report that codecov can ingest.

For Python, you’ll need to install an additional plugin called pytest-cov and set the --cov flag when you run pytest. You can paste the codecov step anywhere after the test step of your workflow.

Generating the coverage report for Codecov creates an XML file named coverage.xml. You don’t want to track that with git, so add coverage.xml to your .gitignore file:

Py .gitignore
__pycache__/
coverage.xml
codecov Py .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

env:
  actor: "41898282+github-actions[bot]"

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - name: Set up Python 3.11
      uses: actions/setup-python@v3
      with:
        python-version: "3.11"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest pytest-cov black flake8 # add pytest-cov plugin
        if [ -f requirements.txt ]; then
            pip install -r requirements.txt
        fi
    - name: Configure git
      run: |
        git config --local user.email "${actor}@users.noreply.github.com"
        git config --local user.name "$actor"
    - name: Test
      run: |
        pytest --cov=bionitio --cov-report=xml tests/ # specify your package & test paths; writes coverage.xml
    - name: Upload coverage reports to Codecov
      uses: codecov/codecov-action@v3
    - name: Format & lint
      run: |
        black .
        flake8 --extend-ignore E203 --max-line-length 88 .
    - name: Commit and push changes
      run: |
        git add .
        git commit -m "🎨 Style code" || echo "No changes to commit"
        git push

For R, the covr package runs the test suite, generates a report, and uploads it to codecov all with one function. You don’t need codecov’s Action as in the Python workflow, because covr handles that for you.

codecov R .github/workflows/ci.yml
name: CI

on:
  push:
    branches:
      - main
      - master
  pull_request:
    branches:
      - main
      - master

env:
  actor: "41898282+github-actions[bot]"

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck, any::styler, any::lintr, any::covr
          needs: check
          working-directory: bionitio
      - name: Configure git
        run: |
          git config --local user.email "${actor}@users.noreply.github.com"
          git config --local user.name "$actor"
      - name: Check
        uses: r-lib/actions/check-r-package@v2
        with:
          args: 'c("--no-manual", "--as-cran")'
          working-directory: bionitio
      - name: Style & lint
        run: |
            styler::style_dir(".")
            lintr::lint_dir(".")
        shell: Rscript {0}
      - name: Commit and push changes
        run: |
          git add .
          git commit -m "🎨 Style code" || echo "No changes to commit"
          git push
      - name: Test coverage
        run: covr::codecov(path = "bionitio")  # set your package path here
        shell: Rscript {0}

Once your modified CI workflow has completed successfully with the new codecov step, you’ll be able to view coverage reports for your repo on codecov.io and see them in your pull requests. However, if your tests aren’t passing, codecov won’t be able to generate a report.

Codecov report

React to the Slack message with:

  • ✅ once your CI workflow completes successfully with codecov and you can see the coverage report.
  • 🔨 if you need to fix your tests before the workflow can complete.

Codecov status badge

Codecov has a nifty status badge that we can display in our README file too.

Copy and paste the following into your README.md, then replace GITHUB_USERNAME and GITHUB_REPO in both the image and link URLs.

[![codecov](https://codecov.io/gh/GITHUB_USERNAME/GITHUB_REPO/branch/main/graph/badge.svg)](https://codecov.io/gh/GITHUB_USERNAME/GITHUB_REPO)

Here’s what my project README looks like now:

README.md
# bionitio-python

[![ci](https://github.com/kelly-sovacool/bionitio-python/actions/workflows/ci.yml/badge.svg)](https://github.com/kelly-sovacool/bionitio-python/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/kelly-sovacool/bionitio-python/branch/main/graph/badge.svg)](https://codecov.io/gh/kelly-sovacool/bionitio-python)

The color of the badge changes from red to green as coverage increases. If your tests aren’t passing or the codecov upload action didn’t work, it will display unknown as the coverage for now.
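If you maintain several repos, the badge Markdown can be generated rather than edited by hand. This is a minimal sketch based on the URL patterns shown above; the function names ci_badge and codecov_badge are hypothetical, and the defaults assume a workflow file called ci.yml and a default branch called main.

```python
def ci_badge(username, repo, workflow_file="ci.yml"):
    """Markdown for a GitHub Actions workflow status badge."""
    base = f"https://github.com/{username}/{repo}/actions/workflows/{workflow_file}"
    return f"[![ci]({base}/badge.svg)]({base})"


def codecov_badge(username, repo, branch="main"):
    """Markdown for a Codecov coverage badge."""
    base = f"https://codecov.io/gh/{username}/{repo}"
    return f"[![codecov]({base}/branch/{branch}/graph/badge.svg)]({base})"


print(ci_badge("GITHUB_USERNAME", "GITHUB_REPO"))
print(codecov_badge("GITHUB_USERNAME", "GITHUB_REPO"))
```

Replace the placeholder arguments with your own GitHub username and repo name before pasting the output into README.md.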

Codecov badge

React to the Slack message with ✅ once you’ve added the codecov badge to your README.

Interpreting code coverage

Generally, higher code coverage is better. However, a code coverage of 100% doesn’t guarantee that your package is free of bugs; it only means that every line of code is run at least once by your test suite. Not all code strictly needs to be tested; very few software projects have a code coverage anywhere close to 100%. Focus on writing unit tests that test your assumptions about how your code works, and prioritize testing the most important components of your project.
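Here is a small, hypothetical example of code that reaches 100% line coverage while still hiding bugs. The function gc_content is made up for illustration and is not taken from bionitio.

```python
def gc_content(seq):
    """Fraction of bases in `seq` that are G or C."""
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


# These checks execute every line of gc_content, so line coverage is 100%.
assert gc_content("GGCC") == 1.0
assert gc_content("ATGC") == 0.5

# Yet two untested assumptions remain: uppercase input and non-empty input.
lowercase_result = gc_content("atgc")  # lowercase bases are silently not counted
try:
    gc_content("")
    empty_fails = False
except ZeroDivisionError:
    empty_fails = True

print(lowercase_result, empty_fails)
```

Tests that probe assumptions like these are worth more than tests added only to raise the coverage number.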

Wrap-up

If you commented out any failing steps in your CI workflow (e.g. if unit tests failed), uncomment them now and fix the underlying problems before working on any other issues.

Once your CI workflow has completed successfully on the ci branch, assign your partner to review the Pull Request. PR reviewers: only approve your partner’s PR if the CI workflow is working! You can merge your PR into main if the CI workflow completes and the reviewer approves the PR.

Ideally, you should make all changes in a branch that’s separate from main, then open a PR once you think your code resolves the issue you’re working on. Your CI workflow will then run the code quality checks we explored above. If any CI steps fail, fix your code until they succeed. Finally, have another person review your PR to check for things that the CI workflow can’t, like whether your code appropriately solves the problem or implements the feature you set out to address. Following this protocol helps ensure that all changes follow best practices in software engineering before they are adopted into the code base.

Resources

We only scratched the surface on what you can do with continuous integration services and GitHub Actions specifically. Here are related topics and other resources to explore.